A cross-sectional study was conducted in which adults aged 18–83 years were grouped into young (18–39 years), middle-aged (40–59 years), and elderly (>60 years). Body fat percentage (BF%) was estimated by bioelectrical impedance analysis. Pearson’s correlation coefficient (r) was calculated to assess the relationship between BMI and BF% in the different age groups. Multiple regression analysis was performed to determine the effect of body density on that relationship, and polynomial regression was carried out to test its linearity. The relationships between age and BMI and between age and BF% were assessed separately.
Relation with density
A person’s body density depends on how much fat and fat-free mass they carry. Fat is found under the skin, around the internal organs, as an essential part of the central nervous system, as part of the structure of some internal organs, and inside the bone marrow. The density of fat is fairly consistent at 0.91 kilograms per liter, which is lower than that of most fat-free mass. Knowing your total body density does not by itself tell you what percentage of your body is fat, but you can plug the density into the following equation to get a general idea: percent body fat = (495 / body density) - 450.
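The equation above (commonly attributed to Siri) can be checked against the data. The following is a minimal sketch in R, not the authors' code: the helper name `siri_body_fat` is hypothetical, and the three densities are the Density values of the first three rows of the dataset used later in this report.

```r
# Hypothetical helper: percent body fat from body density,
# using the equation quoted above: (495 / density) - 450.
siri_body_fat <- function(density) {
  stopifnot(is.numeric(density), all(density > 0))
  (495 / density) - 450
}

# Density values of the first three rows of the dataset
density_example <- c(1.0708, 1.0853, 1.0414)
estimated_fat <- siri_body_fat(density_example)
print(round(estimated_fat, 2))
```

The estimates fall within about 0.05 of the recorded BodyFat values for those rows (12.3, 6.1, 25.3), which is consistent with BodyFat having been derived from Density in this dataset.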
Relation with age
BMI increases progressively with age in women and plateaus between 40 and 70 years in men. At a fixed BMI, body fat mass increases with age (1.9 kg/decade), as does % fat (1.1–1.4% per decade). The relationship between BMI and % fat is curvilinear (quadratic) rather than linear, with a weaker association at lower BMI.
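One way to test whether a relationship is quadratic rather than linear is to compare two nested regression models. The sketch below uses synthetic data only: the variable names `bmi` and `fat` and every coefficient are assumptions chosen for illustration, not values from the study.

```r
# Synthetic illustration only: these numbers are made up, not study data.
set.seed(1)
bmi <- seq(18, 40, length.out = 60)
fat <- -40 + 3.2 * bmi - 0.04 * bmi^2 + rnorm(60, sd = 1)

linear_fit    <- lm(fat ~ bmi)
quadratic_fit <- lm(fat ~ bmi + I(bmi^2))

# Nested-model F test: does adding the squared term improve the fit?
comparison <- anova(linear_fit, quadratic_fit)
print(comparison)
```

A small p-value in the second row of the ANOVA table indicates that the squared term explains variation the straight line misses.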
Null Hypothesis: There is no relationship between the percentage of body fat for an individual and the body density.
library(plotly)
library(stats)
library(tidyr)
library(DT)
Returns: Dataframe of the dataset
setwd("~/Desktop/HS631_Bio_Informatics/Repo/prediction_of_bodyfat-team-2/data")
bodyfat<- read.csv2("bodyfat_1.csv", header=TRUE, sep=",")
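A note on the import call above: `read.csv2` assumes a comma as the decimal mark, so a file that uses periods for decimals is read with its decimal columns as text, which would explain the `chr` columns reported by `str()` further down. The sketch below illustrates the difference on two rows typed inline (three columns copied from the table that follows), rather than on the real file.

```r
# Two rows in the same comma-separated, period-decimal layout as the file
csv_text <- "Density,BodyFat,Age\n1.0708,12.3,23\n1.0853,6.1,22"

# read.csv2 assumes dec = ",", so period decimals are not parsed as numbers
as_csv2 <- read.csv2(text = csv_text, header = TRUE, sep = ",")
# read.csv assumes dec = ".", so the same columns are parsed as numeric
as_csv  <- read.csv(text = csv_text, header = TRUE)

print(sapply(as_csv2, class))
print(sapply(as_csv, class))
```

Reading the file with `read.csv` would likely make the later column-by-column conversion unnecessary.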
head(as.data.frame(bodyfat),n = 10)

| Density | BodyFat | Age | Weight | Height | Neck | Chest | Abdomen | Hip | Thigh | Knee | Ankle | Biceps | Forearm | Wrist |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.0708 | 12.3 | 23 | 154.25 | 67.75 | 36.2 | 93.1 | 85.2 | 94.5 | 59 | 37.3 | 21.9 | 32 | 27.4 | 17.1 |
| 1.0853 | 6.1 | 22 | 173.25 | 72.25 | 38.5 | 93.6 | 83 | 98.7 | 58.7 | 37.3 | 23.4 | 30.5 | 28.9 | 18.2 |
| 1.0414 | 25.3 | 22 | 154 | 66.25 | 34 | 95.8 | 87.9 | 99.2 | 59.6 | 38.9 | 24 | 28.8 | 25.2 | 16.6 |
| 1.0751 | 10.4 | 26 | 184.75 | 72.25 | 37.4 | 101.8 | 86.4 | 101.2 | 60.1 | 37.3 | 22.8 | 32.4 | 29.4 | 18.2 |
| 1.034 | 28.7 | 24 | 184.25 | 71.25 | 34.4 | 97.3 | 100 | 101.9 | 63.2 | 42.2 | 24 | 32.2 | 27.7 | 17.7 |
| 1.0502 | 20.9 | 24 | 210.25 | 74.75 | 39 | 104.5 | 94.4 | 107.8 | 66 | 42 | 25.6 | 35.7 | 30.6 | 18.8 |
| 1.0549 | 19.2 | 26 | 181 | 69.75 | 36.4 | 105.1 | 90.7 | 100.3 | 58.4 | 38.3 | 22.9 | 31.9 | 27.8 | 17.7 |
| 1.0704 | 12.4 | 25 | 176 | 72.5 | 37.8 | 99.6 | 88.5 | 97.1 | 60 | 39.4 | 23.2 | 30.5 | 29 | 18.8 |
| 1.09 | 4.1 | 25 | 191 | 74 | 38.1 | 100.9 | 82.5 | 99.9 | 62.9 | 38.3 | 23.8 | 35.9 | 31.1 | 18.2 |
| 1.0722 | 11.7 | 23 | 198.25 | 73.5 | 42.1 | 99.6 | 88.6 | 104.1 | 63.1 | 41.7 | 25 | 35.6 | 30 | 19.2 |
head(bodyfat, 50)

| Density | BodyFat | Age | Weight | Height | Neck | Chest | Abdomen | Hip | Thigh | Knee | Ankle | Biceps | Forearm | Wrist |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.0708 | 12.3 | 23 | 154.25 | 67.75 | 36.2 | 93.1 | 85.2 | 94.5 | 59 | 37.3 | 21.9 | 32 | 27.4 | 17.1 |
| 1.0853 | 6.1 | 22 | 173.25 | 72.25 | 38.5 | 93.6 | 83 | 98.7 | 58.7 | 37.3 | 23.4 | 30.5 | 28.9 | 18.2 |
| 1.0414 | 25.3 | 22 | 154 | 66.25 | 34 | 95.8 | 87.9 | 99.2 | 59.6 | 38.9 | 24 | 28.8 | 25.2 | 16.6 |
| 1.0751 | 10.4 | 26 | 184.75 | 72.25 | 37.4 | 101.8 | 86.4 | 101.2 | 60.1 | 37.3 | 22.8 | 32.4 | 29.4 | 18.2 |
| 1.034 | 28.7 | 24 | 184.25 | 71.25 | 34.4 | 97.3 | 100 | 101.9 | 63.2 | 42.2 | 24 | 32.2 | 27.7 | 17.7 |
| 1.0502 | 20.9 | 24 | 210.25 | 74.75 | 39 | 104.5 | 94.4 | 107.8 | 66 | 42 | 25.6 | 35.7 | 30.6 | 18.8 |
| 1.0549 | 19.2 | 26 | 181 | 69.75 | 36.4 | 105.1 | 90.7 | 100.3 | 58.4 | 38.3 | 22.9 | 31.9 | 27.8 | 17.7 |
| 1.0704 | 12.4 | 25 | 176 | 72.5 | 37.8 | 99.6 | 88.5 | 97.1 | 60 | 39.4 | 23.2 | 30.5 | 29 | 18.8 |
| 1.09 | 4.1 | 25 | 191 | 74 | 38.1 | 100.9 | 82.5 | 99.9 | 62.9 | 38.3 | 23.8 | 35.9 | 31.1 | 18.2 |
| 1.0722 | 11.7 | 23 | 198.25 | 73.5 | 42.1 | 99.6 | 88.6 | 104.1 | 63.1 | 41.7 | 25 | 35.6 | 30 | 19.2 |
| 1.083 | 7.1 | 26 | 186.25 | 74.5 | 38.5 | 101.5 | 83.6 | 98.2 | 59.7 | 39.7 | 25.2 | 32.8 | 29.4 | 18.5 |
| 1.0812 | 7.8 | 27 | 216 | 76 | 39.4 | 103.6 | 90.9 | 107.7 | 66.2 | 39.2 | 25.9 | 37.2 | 30.2 | 19 |
| 1.0513 | 20.8 | 32 | 180.5 | 69.5 | 38.4 | 102 | 91.6 | 103.9 | 63.4 | 38.3 | 21.5 | 32.5 | 28.6 | 17.7 |
| 1.0505 | 21.2 | 30 | 205.25 | 71.25 | 39.4 | 104.1 | 101.8 | 108.6 | 66 | 41.5 | 23.7 | 36.9 | 31.6 | 18.8 |
| 1.0484 | 22.1 | 35 | 187.75 | 69.5 | 40.5 | 101.3 | 96.4 | 100.1 | 69 | 39 | 23.1 | 36.1 | 30.5 | 18.2 |
| 1.0512 | 20.9 | 35 | 162.75 | 66 | 36.4 | 99.1 | 92.8 | 99.2 | 63.1 | 38.7 | 21.7 | 31.1 | 26.4 | 16.9 |
| 1.0333 | 29 | 34 | 195.75 | 71 | 38.9 | 101.9 | 96.4 | 105.2 | 64.8 | 40.8 | 23.1 | 36.2 | 30.8 | 17.3 |
| 1.0468 | 22.9 | 32 | 209.25 | 71 | 42.1 | 107.6 | 97.5 | 107 | 66.9 | 40 | 24.4 | 38.2 | 31.6 | 19.3 |
| 1.0622 | 16 | 28 | 183.75 | 67.75 | 38 | 106.8 | 89.6 | 102.4 | 64.2 | 38.7 | 22.9 | 37.2 | 30.5 | 18.5 |
| 1.061 | 16.5 | 33 | 211.75 | 73.5 | 40 | 106.2 | 100.5 | 109 | 65.8 | 40.6 | 24 | 37.1 | 30.1 | 18.2 |
| 1.0551 | 19.1 | 28 | 179 | 68 | 39.1 | 103.3 | 95.9 | 104.9 | 63.5 | 38 | 22.1 | 32.5 | 30.3 | 18.4 |
| 1.064 | 15.2 | 28 | 200.5 | 69.75 | 41.3 | 111.4 | 98.8 | 104.8 | 63.4 | 40.6 | 24.6 | 33 | 32.8 | 19.9 |
| 1.0631 | 15.6 | 31 | 140.25 | 68.25 | 33.9 | 86 | 76.4 | 94.6 | 57.4 | 35.3 | 22.2 | 27.9 | 25.9 | 16.7 |
| 1.0584 | 17.7 | 32 | 148.75 | 70 | 35.5 | 86.7 | 80 | 93.4 | 54.9 | 36.2 | 22.1 | 29.8 | 26.7 | 17.1 |
| 1.0668 | 14 | 28 | 151.25 | 67.75 | 34.5 | 90.2 | 76.3 | 95.8 | 58.4 | 35.5 | 22.9 | 31.1 | 28 | 17.6 |
| 1.0911 | 3.7 | 27 | 159.25 | 71.5 | 35.7 | 89.6 | 79.7 | 96.5 | 55 | 36.7 | 22.5 | 29.9 | 28.2 | 17.7 |
| 1.0811 | 7.9 | 34 | 131.5 | 67.5 | 36.2 | 88.6 | 74.6 | 85.3 | 51.7 | 34.7 | 21.4 | 28.7 | 27 | 16.5 |
| 1.0468 | 22.9 | 31 | 148 | 67.5 | 38.8 | 97.4 | 88.7 | 94.7 | 57.5 | 36 | 21 | 29.2 | 26.6 | 17 |
| 1.091 | 3.7 | 27 | 133.25 | 64.75 | 36.4 | 93.5 | 73.9 | 88.5 | 50.1 | 34.5 | 21.3 | 30.5 | 27.9 | 17.2 |
| 1.079 | 8.8 | 29 | 160.75 | 69 | 36.7 | 97.4 | 83.5 | 98.7 | 58.9 | 35.3 | 22.6 | 30.1 | 26.7 | 17.6 |
| 1.0716 | 11.9 | 32 | 182 | 73.75 | 38.7 | 100.5 | 88.7 | 99.8 | 57.5 | 38.7 | 33.9 | 32.5 | 27.7 | 18.4 |
| 1.0862 | 5.7 | 29 | 160.25 | 71.25 | 37.3 | 93.5 | 84.5 | 100.6 | 58.5 | 38.8 | 21.5 | 30.1 | 26.4 | 17.9 |
| 1.0719 | 11.8 | 27 | 168 | 71.25 | 38.1 | 93 | 79.1 | 94.5 | 57.3 | 36.2 | 24.5 | 29 | 30 | 18.8 |
| 1.0502 | 21.3 | 41 | 218.5 | 71 | 39.8 | 111.7 | 100.5 | 108.3 | 67.1 | 44.2 | 25.2 | 37.5 | 31.5 | 18.7 |
| 1.0263 | 32.3 | 41 | 247.25 | 73.5 | 42.1 | 117 | 115.6 | 116.1 | 71.2 | 43.3 | 26.3 | 37.3 | 31.7 | 19.7 |
| 1.0101 | 40.1 | 49 | 191.75 | 65 | 38.4 | 118.5 | 113.1 | 113.8 | 61.9 | 38.3 | 21.9 | 32 | 29.8 | 17 |
| 1.0438 | 24.2 | 40 | 202.25 | 70 | 38.5 | 106.5 | 100.9 | 106.2 | 63.5 | 39.9 | 22.6 | 35.1 | 30.6 | 19 |
| 1.0346 | 28.4 | 50 | 196.75 | 68.25 | 42.1 | 105.6 | 98.8 | 104.8 | 66 | 41.5 | 24.7 | 33.2 | 30.5 | 19.4 |
| 1.0202 | 35.2 | 46 | 363.15 | 72.25 | 51.2 | 136.2 | 148.1 | 147.7 | 87.3 | 49.1 | 29.6 | 45 | 29 | 21.4 |
| 1.0258 | 32.6 | 50 | 203 | 67 | 40.2 | 114.8 | 108.1 | 102.5 | 61.3 | 41.1 | 24.7 | 34.1 | 31 | 18.3 |
| 1.0217 | 34.5 | 45 | 262.75 | 68.75 | 43.2 | 128.3 | 126.2 | 125.6 | 72.5 | 39.6 | 26.6 | 36.4 | 32.7 | 21.4 |
| 1.025 | 32.9 | 44 | 205 | 29.5 | 36.6 | 106 | 104.3 | 115.5 | 70.6 | 42.5 | 23.7 | 33.6 | 28.7 | 17.4 |
| 1.0279 | 31.6 | 48 | 217 | 70 | 37.3 | 113.3 | 111.2 | 114.1 | 67.7 | 40.9 | 25 | 36.7 | 29.8 | 18.4 |
| 1.0269 | 32 | 41 | 212 | 71.5 | 41.5 | 106.6 | 104.3 | 106 | 65 | 40.2 | 23 | 35.8 | 31.5 | 18.8 |
| 1.0814 | 7.7 | 39 | 125.25 | 68 | 31.5 | 85.1 | 76 | 88.2 | 50 | 34.7 | 21 | 26.1 | 23.1 | 16.1 |
| 1.067 | 13.9 | 43 | 164.25 | 73.25 | 35.7 | 96.6 | 81.5 | 97.2 | 58.4 | 38.2 | 23.4 | 29.7 | 27.4 | 18.3 |
| 1.0742 | 10.8 | 40 | 133.5 | 67.5 | 33.6 | 88.2 | 73.7 | 88.5 | 53.3 | 34.5 | 22.5 | 27.9 | 26.2 | 17.3 |
| 1.0665 | 5.6 | 39 | 148.5 | 71.25 | 34.6 | 89.8 | 79.5 | 92.7 | 52.7 | 37.5 | 21.9 | 28.8 | 26.8 | 17.9 |
| 1.0678 | 13.6 | 45 | 135.75 | 68.5 | 32.8 | 92.3 | 83.4 | 90.4 | 52 | 35.8 | 20.6 | 28.8 | 25.5 | 16.3 |
| 1.0903 | 4 | 47 | 127.5 | 66.75 | 34 | 83.4 | 70.4 | 87.2 | 50.6 | 34.4 | 21.9 | 26.8 | 25.8 | 16.8 |
datatable(head(bodyfat,50))
Returns: Find structure of the dataset
getwd()
## [1] "/Users/Yutachen/Desktop/Project_presentation_template"
str(bodyfat)
## 'data.frame': 1008 obs. of 15 variables:
## $ Density: chr "1.0708" "1.0853" "1.0414" "1.0751" ...
## $ BodyFat: chr "12.3" "6.1" "25.3" "10.4" ...
## $ Age : int 23 22 22 26 24 24 26 25 25 23 ...
## $ Weight : chr "154.25" "173.25" "154" "184.75" ...
## $ Height : chr "67.75" "72.25" "66.25" "72.25" ...
## $ Neck : chr "36.2" "38.5" "34" "37.4" ...
## $ Chest : chr "93.1" "93.6" "95.8" "101.8" ...
## $ Abdomen: chr "85.2" "83" "87.9" "86.4" ...
## $ Hip : chr "94.5" "98.7" "99.2" "101.2" ...
## $ Thigh : chr "59" "58.7" "59.6" "60.1" ...
## $ Knee : chr "37.3" "37.3" "38.9" "37.3" ...
## $ Ankle : chr "21.9" "23.4" "24" "22.8" ...
## $ Biceps : chr "32" "30.5" "28.8" "32.4" ...
## $ Forearm: chr "27.4" "28.9" "25.2" "29.4" ...
## $ Wrist : chr "17.1" "18.2" "16.6" "18.2" ...
dt_unique <- unique(bodyfat)
nrow(dt_unique)
## [1] 1008
rm(dt_unique)
bodyfat_1 <- bodyfat
Restructure the datatypes
bodyfat_1$Density<- as.numeric(as.character(bodyfat_1$Density))
bodyfat_1$BodyFat<- as.numeric(as.character(bodyfat_1$BodyFat))
bodyfat_1$Age<- as.numeric(as.character(bodyfat_1$Age))
bodyfat_1$Weight<-as.numeric(as.character(bodyfat_1$Weight))
bodyfat_1$Height<-as.numeric(as.character (bodyfat_1$Height))
bodyfat_1$Neck<-as.numeric(as.character(bodyfat_1$Neck))
bodyfat_1$Chest<-as.numeric(as.character(bodyfat_1$Chest))
bodyfat_1$Abdomen<-as.numeric(as.character(bodyfat_1$Abdomen))
bodyfat_1$Hip<-as.numeric(as.character(bodyfat_1$Hip))
bodyfat_1$Thigh<-as.numeric(as.character(bodyfat_1$Thigh))
bodyfat_1$Knee<-as.numeric(as.character(bodyfat_1$Knee))
bodyfat_1$Ankle<-as.numeric(as.character(bodyfat_1$Ankle))
bodyfat_1$Biceps<-as.numeric(as.character(bodyfat_1$Biceps))
bodyfat_1$Forearm<-as.numeric(as.character(bodyfat_1$Forearm))
bodyfat_1$Wrist<-as.numeric(as.character(bodyfat_1$Wrist))
str(bodyfat_1)
## 'data.frame': 1008 obs. of 15 variables:
## $ Density: num 1.07 1.09 1.04 1.08 1.03 ...
## $ BodyFat: num 12.3 6.1 25.3 10.4 28.7 20.9 19.2 12.4 4.1 11.7 ...
## $ Age : num 23 22 22 26 24 24 26 25 25 23 ...
## $ Weight : num 154 173 154 185 184 ...
## $ Height : num 67.8 72.2 66.2 72.2 71.2 ...
## $ Neck : num 36.2 38.5 34 37.4 34.4 39 36.4 37.8 38.1 42.1 ...
## $ Chest : num 93.1 93.6 95.8 101.8 97.3 ...
## $ Abdomen: num 85.2 83 87.9 86.4 100 94.4 90.7 88.5 82.5 88.6 ...
## $ Hip : num 94.5 98.7 99.2 101.2 101.9 ...
## $ Thigh : num 59 58.7 59.6 60.1 63.2 66 58.4 60 62.9 63.1 ...
## $ Knee : num 37.3 37.3 38.9 37.3 42.2 42 38.3 39.4 38.3 41.7 ...
## $ Ankle : num 21.9 23.4 24 22.8 24 25.6 22.9 23.2 23.8 25 ...
## $ Biceps : num 32 30.5 28.8 32.4 32.2 35.7 31.9 30.5 35.9 35.6 ...
## $ Forearm: num 27.4 28.9 25.2 29.4 27.7 30.6 27.8 29 31.1 30 ...
## $ Wrist : num 17.1 18.2 16.6 18.2 17.7 18.8 17.7 18.8 18.2 19.2 ...
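The fifteen column-by-column conversions above can also be written as a single step. This is a sketch on a toy data frame (the object `toy` and its values are illustrative stand-ins for `bodyfat_1`), not a replacement of the authors' code.

```r
# Toy data frame standing in for the character columns shown above
toy <- data.frame(Density = c("1.0708", "1.0853"),
                  Age = c(23L, 22L),
                  stringsAsFactors = FALSE)

# Convert every column to numeric in one step;
# assigning into toy[] keeps the data.frame structure and column names
toy[] <- lapply(toy, function(column) as.numeric(as.character(column)))
str(toy)
```

The `as.character()` step is kept so that the same line also behaves correctly if a column was read as a factor.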
head(bodyfat,50)

| Density | BodyFat | Age | Weight | Height | Neck | Chest | Abdomen | Hip | Thigh | Knee | Ankle | Biceps | Forearm | Wrist |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.0708 | 12.3 | 23 | 154.25 | 67.75 | 36.2 | 93.1 | 85.2 | 94.5 | 59 | 37.3 | 21.9 | 32 | 27.4 | 17.1 |
| 1.0853 | 6.1 | 22 | 173.25 | 72.25 | 38.5 | 93.6 | 83 | 98.7 | 58.7 | 37.3 | 23.4 | 30.5 | 28.9 | 18.2 |
| 1.0414 | 25.3 | 22 | 154 | 66.25 | 34 | 95.8 | 87.9 | 99.2 | 59.6 | 38.9 | 24 | 28.8 | 25.2 | 16.6 |
| 1.0751 | 10.4 | 26 | 184.75 | 72.25 | 37.4 | 101.8 | 86.4 | 101.2 | 60.1 | 37.3 | 22.8 | 32.4 | 29.4 | 18.2 |
| 1.034 | 28.7 | 24 | 184.25 | 71.25 | 34.4 | 97.3 | 100 | 101.9 | 63.2 | 42.2 | 24 | 32.2 | 27.7 | 17.7 |
| 1.0502 | 20.9 | 24 | 210.25 | 74.75 | 39 | 104.5 | 94.4 | 107.8 | 66 | 42 | 25.6 | 35.7 | 30.6 | 18.8 |
| 1.0549 | 19.2 | 26 | 181 | 69.75 | 36.4 | 105.1 | 90.7 | 100.3 | 58.4 | 38.3 | 22.9 | 31.9 | 27.8 | 17.7 |
| 1.0704 | 12.4 | 25 | 176 | 72.5 | 37.8 | 99.6 | 88.5 | 97.1 | 60 | 39.4 | 23.2 | 30.5 | 29 | 18.8 |
| 1.09 | 4.1 | 25 | 191 | 74 | 38.1 | 100.9 | 82.5 | 99.9 | 62.9 | 38.3 | 23.8 | 35.9 | 31.1 | 18.2 |
| 1.0722 | 11.7 | 23 | 198.25 | 73.5 | 42.1 | 99.6 | 88.6 | 104.1 | 63.1 | 41.7 | 25 | 35.6 | 30 | 19.2 |
| 1.083 | 7.1 | 26 | 186.25 | 74.5 | 38.5 | 101.5 | 83.6 | 98.2 | 59.7 | 39.7 | 25.2 | 32.8 | 29.4 | 18.5 |
| 1.0812 | 7.8 | 27 | 216 | 76 | 39.4 | 103.6 | 90.9 | 107.7 | 66.2 | 39.2 | 25.9 | 37.2 | 30.2 | 19 |
| 1.0513 | 20.8 | 32 | 180.5 | 69.5 | 38.4 | 102 | 91.6 | 103.9 | 63.4 | 38.3 | 21.5 | 32.5 | 28.6 | 17.7 |
| 1.0505 | 21.2 | 30 | 205.25 | 71.25 | 39.4 | 104.1 | 101.8 | 108.6 | 66 | 41.5 | 23.7 | 36.9 | 31.6 | 18.8 |
| 1.0484 | 22.1 | 35 | 187.75 | 69.5 | 40.5 | 101.3 | 96.4 | 100.1 | 69 | 39 | 23.1 | 36.1 | 30.5 | 18.2 |
| 1.0512 | 20.9 | 35 | 162.75 | 66 | 36.4 | 99.1 | 92.8 | 99.2 | 63.1 | 38.7 | 21.7 | 31.1 | 26.4 | 16.9 |
| 1.0333 | 29 | 34 | 195.75 | 71 | 38.9 | 101.9 | 96.4 | 105.2 | 64.8 | 40.8 | 23.1 | 36.2 | 30.8 | 17.3 |
| 1.0468 | 22.9 | 32 | 209.25 | 71 | 42.1 | 107.6 | 97.5 | 107 | 66.9 | 40 | 24.4 | 38.2 | 31.6 | 19.3 |
| 1.0622 | 16 | 28 | 183.75 | 67.75 | 38 | 106.8 | 89.6 | 102.4 | 64.2 | 38.7 | 22.9 | 37.2 | 30.5 | 18.5 |
| 1.061 | 16.5 | 33 | 211.75 | 73.5 | 40 | 106.2 | 100.5 | 109 | 65.8 | 40.6 | 24 | 37.1 | 30.1 | 18.2 |
| 1.0551 | 19.1 | 28 | 179 | 68 | 39.1 | 103.3 | 95.9 | 104.9 | 63.5 | 38 | 22.1 | 32.5 | 30.3 | 18.4 |
| 1.064 | 15.2 | 28 | 200.5 | 69.75 | 41.3 | 111.4 | 98.8 | 104.8 | 63.4 | 40.6 | 24.6 | 33 | 32.8 | 19.9 |
| 1.0631 | 15.6 | 31 | 140.25 | 68.25 | 33.9 | 86 | 76.4 | 94.6 | 57.4 | 35.3 | 22.2 | 27.9 | 25.9 | 16.7 |
| 1.0584 | 17.7 | 32 | 148.75 | 70 | 35.5 | 86.7 | 80 | 93.4 | 54.9 | 36.2 | 22.1 | 29.8 | 26.7 | 17.1 |
| 1.0668 | 14 | 28 | 151.25 | 67.75 | 34.5 | 90.2 | 76.3 | 95.8 | 58.4 | 35.5 | 22.9 | 31.1 | 28 | 17.6 |
| 1.0911 | 3.7 | 27 | 159.25 | 71.5 | 35.7 | 89.6 | 79.7 | 96.5 | 55 | 36.7 | 22.5 | 29.9 | 28.2 | 17.7 |
| 1.0811 | 7.9 | 34 | 131.5 | 67.5 | 36.2 | 88.6 | 74.6 | 85.3 | 51.7 | 34.7 | 21.4 | 28.7 | 27 | 16.5 |
| 1.0468 | 22.9 | 31 | 148 | 67.5 | 38.8 | 97.4 | 88.7 | 94.7 | 57.5 | 36 | 21 | 29.2 | 26.6 | 17 |
| 1.091 | 3.7 | 27 | 133.25 | 64.75 | 36.4 | 93.5 | 73.9 | 88.5 | 50.1 | 34.5 | 21.3 | 30.5 | 27.9 | 17.2 |
| 1.079 | 8.8 | 29 | 160.75 | 69 | 36.7 | 97.4 | 83.5 | 98.7 | 58.9 | 35.3 | 22.6 | 30.1 | 26.7 | 17.6 |
| 1.0716 | 11.9 | 32 | 182 | 73.75 | 38.7 | 100.5 | 88.7 | 99.8 | 57.5 | 38.7 | 33.9 | 32.5 | 27.7 | 18.4 |
| 1.0862 | 5.7 | 29 | 160.25 | 71.25 | 37.3 | 93.5 | 84.5 | 100.6 | 58.5 | 38.8 | 21.5 | 30.1 | 26.4 | 17.9 |
| 1.0719 | 11.8 | 27 | 168 | 71.25 | 38.1 | 93 | 79.1 | 94.5 | 57.3 | 36.2 | 24.5 | 29 | 30 | 18.8 |
| 1.0502 | 21.3 | 41 | 218.5 | 71 | 39.8 | 111.7 | 100.5 | 108.3 | 67.1 | 44.2 | 25.2 | 37.5 | 31.5 | 18.7 |
| 1.0263 | 32.3 | 41 | 247.25 | 73.5 | 42.1 | 117 | 115.6 | 116.1 | 71.2 | 43.3 | 26.3 | 37.3 | 31.7 | 19.7 |
| 1.0101 | 40.1 | 49 | 191.75 | 65 | 38.4 | 118.5 | 113.1 | 113.8 | 61.9 | 38.3 | 21.9 | 32 | 29.8 | 17 |
| 1.0438 | 24.2 | 40 | 202.25 | 70 | 38.5 | 106.5 | 100.9 | 106.2 | 63.5 | 39.9 | 22.6 | 35.1 | 30.6 | 19 |
| 1.0346 | 28.4 | 50 | 196.75 | 68.25 | 42.1 | 105.6 | 98.8 | 104.8 | 66 | 41.5 | 24.7 | 33.2 | 30.5 | 19.4 |
| 1.0202 | 35.2 | 46 | 363.15 | 72.25 | 51.2 | 136.2 | 148.1 | 147.7 | 87.3 | 49.1 | 29.6 | 45 | 29 | 21.4 |
| 1.0258 | 32.6 | 50 | 203 | 67 | 40.2 | 114.8 | 108.1 | 102.5 | 61.3 | 41.1 | 24.7 | 34.1 | 31 | 18.3 |
| 1.0217 | 34.5 | 45 | 262.75 | 68.75 | 43.2 | 128.3 | 126.2 | 125.6 | 72.5 | 39.6 | 26.6 | 36.4 | 32.7 | 21.4 |
| 1.025 | 32.9 | 44 | 205 | 29.5 | 36.6 | 106 | 104.3 | 115.5 | 70.6 | 42.5 | 23.7 | 33.6 | 28.7 | 17.4 |
| 1.0279 | 31.6 | 48 | 217 | 70 | 37.3 | 113.3 | 111.2 | 114.1 | 67.7 | 40.9 | 25 | 36.7 | 29.8 | 18.4 |
| 1.0269 | 32 | 41 | 212 | 71.5 | 41.5 | 106.6 | 104.3 | 106 | 65 | 40.2 | 23 | 35.8 | 31.5 | 18.8 |
| 1.0814 | 7.7 | 39 | 125.25 | 68 | 31.5 | 85.1 | 76 | 88.2 | 50 | 34.7 | 21 | 26.1 | 23.1 | 16.1 |
| 1.067 | 13.9 | 43 | 164.25 | 73.25 | 35.7 | 96.6 | 81.5 | 97.2 | 58.4 | 38.2 | 23.4 | 29.7 | 27.4 | 18.3 |
| 1.0742 | 10.8 | 40 | 133.5 | 67.5 | 33.6 | 88.2 | 73.7 | 88.5 | 53.3 | 34.5 | 22.5 | 27.9 | 26.2 | 17.3 |
| 1.0665 | 5.6 | 39 | 148.5 | 71.25 | 34.6 | 89.8 | 79.5 | 92.7 | 52.7 | 37.5 | 21.9 | 28.8 | 26.8 | 17.9 |
| 1.0678 | 13.6 | 45 | 135.75 | 68.5 | 32.8 | 92.3 | 83.4 | 90.4 | 52 | 35.8 | 20.6 | 28.8 | 25.5 | 16.3 |
| 1.0903 | 4 | 47 | 127.5 | 66.75 | 34 | 83.4 | 70.4 | 87.2 | 50.6 | 34.4 | 21.9 | 26.8 | 25.8 | 16.8 |
Returns: Check for NAs
any(is.na(bodyfat_1))
## [1] FALSE
Density (create new categorical variable for Density and bodyfat)
summary(bodyfat_1$BodyFat)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 14.20 20.80 20.75 26.80 50.70
bodyfat_1$fat_value_cat[bodyfat_1$BodyFat>=0 & bodyfat_1$BodyFat<=14.20]="Low body fat"
bodyfat_1$fat_value_cat[bodyfat_1$BodyFat>=14.21 & bodyfat_1$BodyFat<=26.80]="Medium body fat"
bodyfat_1$fat_value_cat[bodyfat_1$BodyFat>=26.81 & bodyfat_1$BodyFat<=50.70]="High body fat"
table(bodyfat_1$fat_value_cat)
##
## High body fat Low body fat Medium body fat
## 250 253 505
any(is.na(bodyfat_1$fat_value_cat))
## [1] FALSE
summary(bodyfat_1$Density)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.995 1.046 1.059 1.060 1.074 1.118
bodyfat_1$Density_cat[bodyfat_1$Density>=0.995 & bodyfat_1$Density<=1.046]="Low density"
bodyfat_1$Density_cat[bodyfat_1$Density>=1.0461 & bodyfat_1$Density<=1.074]="Medium density"
bodyfat_1$Density_cat[bodyfat_1$Density>=1.0741 & bodyfat_1$Density<=1.1179]="High density"
table(bodyfat_1$Density_cat)
##
## High density Low density Medium density
## 260 245 503
any(is.na(bodyfat_1$Density_cat))
## [1] FALSE
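The Low / Medium / High categories above are cut at the first and third quartiles, with the cut points typed in by hand. The sketch below shows one way to derive the same kind of bins directly from the data with `quantile()` and `cut()`. The helper name `quartile_bins` and the sample values are hypothetical.

```r
# Hypothetical helper: split a numeric vector into Low / Medium / High
# at its first and third quartiles, without hard-coding the cut points.
quartile_bins <- function(x, labels = c("Low", "Medium", "High")) {
  breaks <- quantile(x, probs = c(0, 0.25, 0.75, 1), na.rm = TRUE)
  cut(x, breaks = breaks, labels = labels,
      include.lowest = TRUE, ordered_result = TRUE)
}

# Made-up values for illustration only
values <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
binned <- quartile_bins(values)
print(table(binned))
```

Because the result is already an ordered factor, no separate `factor(..., levels = ...)` call is needed before plotting. Note that `cut()` stops with an error if two quartiles coincide, so this approach assumes the variable has enough distinct values.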
Visualize
bodyfat_1$Density_cat<-factor(bodyfat_1$Density_cat, ordered = TRUE, levels = c("Low density","Medium density","High density"))
bodyfat_1$fat_value_cat<-factor(bodyfat_1$fat_value_cat, ordered = TRUE, levels = c("Low body fat","Medium body fat","High body fat"))
ggplot(bodyfat_1, aes(x=fat_value_cat, fill=Density_cat))+geom_bar(position = position_dodge(preserve = "single"))
#Univariate Distribution for Density variable #Approximately normally distributed #Negligible skew #Slightly platykurtic
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ tibble 3.1.5 ✓ dplyr 1.0.7
## ✓ readr 2.0.1 ✓ stringr 1.4.0
## ✓ purrr 0.3.4 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks plotly::filter(), stats::filter()
## x dplyr::lag() masks stats::lag()
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
plot(density(bodyfat_1$Density))
hist(bodyfat_1$Density)
summary(bodyfat_1$Density)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.995 1.046 1.059 1.060 1.074 1.118
skew(bodyfat_1$Density)
## [1] -0.01920148
kurtosi(bodyfat_1$Density)
## [1] -0.3155286
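To make the two statistics above concrete, here is a minimal sketch of skewness and excess kurtosis computed from central moments. The helper names are hypothetical and the formulas are the plain population-moment versions, so their results can differ slightly from `psych::skew()` and `psych::kurtosi()`, which accept a `type` argument selecting among estimators.

```r
# Hypothetical helpers based on central moments (dividing by n)
moment_skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
moment_excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Made-up symmetric sample with light tails
x <- c(1, 2, 3, 4, 5)
print(moment_skew(x))             # 0 for a symmetric sample
print(moment_excess_kurtosis(x))  # negative: flatter than a normal curve
```

A skewness near 0 indicates symmetry. Excess kurtosis above 0 is called leptokurtic (heavier tails than the normal distribution) and below 0 platykurtic.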
Check for correlation #Negatively correlated
cor.test(bodyfat_1$BodyFat, bodyfat_1$Density)
##
## Pearson's product-moment correlation
##
## data: bodyfat_1$BodyFat and bodyfat_1$Density
## t = -89.992, df = 1006, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.9495837 -0.9358921
## sample estimates:
## cor
## -0.9431365
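The t statistic in the output above follows from the correlation coefficient and the degrees of freedom alone. The sketch below recomputes it from the two numbers printed by `cor.test`; the variable names are illustrative.

```r
# Values copied from the cor.test output above
r  <- -0.9431365
df <- 1006

# t statistic for testing that the true correlation is zero
t_stat <- r * sqrt(df / (1 - r^2))
# Two-sided p-value from the t distribution with df degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df)

print(t_stat)
print(p_value)
```

The recomputed statistic agrees with the reported t of about -89.99. With a p-value this small, the null hypothesis of no relationship between body fat percentage and body density is rejected.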
Age (create new categorical variable for Age)
summary(bodyfat_1$Age)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 22.00 38.00 45.00 46.38 55.00 84.00
bodyfat_1$age_cat[bodyfat_1$Age>=22 & bodyfat_1$Age<=38]="Young"
bodyfat_1$age_cat[bodyfat_1$Age>=39 & bodyfat_1$Age<=55]="Middle age"
bodyfat_1$age_cat[bodyfat_1$Age>=56 & bodyfat_1$Age<=84]="Old"
table(bodyfat_1$age_cat)
##
## Middle age Old Young
## 506 236 266
Visualize
bodyfat_1$age_cat<-factor(bodyfat_1$age_cat, ordered = TRUE, levels = c("Young", "Middle age","Old"))
ggplot(bodyfat_1, aes(x=fat_value_cat, fill=age_cat))+geom_bar(position = position_dodge(preserve = "single"))
getwd()
## [1] "/Users/Yutachen/Desktop/Project_presentation_template"
#Univariate Distribution for Age variable #Approximately normally distributed #Slightly positively skewed #Platykurtic
plot(density(bodyfat_1$Age))
hist(bodyfat_1$Age)
summary(bodyfat_1$Age)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 22.00 38.00 45.00 46.38 55.00 84.00
skew(bodyfat_1$Age)
## [1] 0.2781083
kurtosi(bodyfat_1$Age)
## [1] -0.4303825
Check for correlation #Positively correlated
cor(bodyfat_1$BodyFat, bodyfat_1$Age)
## [1] 0.299733
Weight (create new categorical variable for Weight)
summary(bodyfat_1$Weight)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 118.5 160.5 178.0 180.9 199.0 367.1
bodyfat_1$weight_cat[bodyfat_1$Weight>=118.5 & bodyfat_1$Weight<=160.5]="Low weight"
bodyfat_1$weight_cat[bodyfat_1$Weight>=160.5 & bodyfat_1$Weight<=199]="Midium weight"
bodyfat_1$weight_cat[bodyfat_1$Weight>=199 & bodyfat_1$Weight<=368.1]="High weight"
table(bodyfat_1$weight_cat)
##
## High weight Low weight Midium weight
## 255 251 502
Visualize
bodyfat_1$weight_cat<-factor(bodyfat_1$weight_cat, ordered = TRUE, levels = c("Low weight", "Midium weight","High weight"))
ggplot(bodyfat_1, aes(x=fat_value_cat, fill=weight_cat))+geom_bar(position = position_dodge(preserve = "single"))
#Univariate Distribution for Weight variable #Normally distributed #Positively skewed #Leptokurtic
plot(density(bodyfat_1$Weight))
hist(bodyfat_1$Weight)
summary(bodyfat_1$Weight)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 118.5 160.5 178.0 180.9 199.0 367.1
skew(bodyfat_1$Weight)
## [1] 1.192135
kurtosi(bodyfat_1$Weight)
## [1] 5.101895
Check for correlation #Positively correlated
cor(bodyfat_1$BodyFat, bodyfat_1$Weight)
## [1] 0.6122873
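Since the same `cor()` call is repeated for each measurement, the correlations with BodyFat can also be collected in one pass. The sketch below runs on a toy data frame with made-up values; only the column names mirror the dataset.

```r
# Toy data frame with made-up values; column names mirror the dataset
toy <- data.frame(
  BodyFat = c(12, 6, 25, 10, 29, 21),
  Age     = c(23, 22, 22, 26, 24, 24),
  Weight  = c(154, 173, 154, 185, 184, 210)
)

predictors <- setdiff(names(toy), "BodyFat")
# One Pearson correlation with BodyFat per predictor, as a named vector
correlations <- sapply(toy[predictors],
                       function(column) cor(toy$BodyFat, column))
print(sort(correlations, decreasing = TRUE))
```

Sorting the named vector gives a quick ranking of which measurements track body fat most closely.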
Height
summary(bodyfat_1$Height)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 29.50 70.00 72.00 71.90 74.00 81.25
bodyfat_1$height_cat[bodyfat_1$Height>=29.5 & bodyfat_1$Height<=70]="short"
bodyfat_1$height_cat[bodyfat_1$Height>=70.01 & bodyfat_1$Height<=74]="Midium"
bodyfat_1$height_cat[bodyfat_1$Height>=74.01 & bodyfat_1$Height<=81.25]="tall"
table(bodyfat_1$height_cat)
##
## Midium short tall
## 484 275 249
any(is.na(bodyfat_1$height_cat))
## [1] FALSE
Visualize
bodyfat_1$height_cat<-factor(bodyfat_1$height_cat, ordered = TRUE, levels = c("short", "Midium","tall"))
ggplot(bodyfat_1, aes(x=fat_value_cat, fill=height_cat))+geom_bar(position = position_dodge(preserve = "single"))
#Univariate Distribution for Height variable #Normally distributed #Negatively skewed #Leptokurtic
plot(density(bodyfat_1$Height))
hist(bodyfat_1$Height)
summary(bodyfat_1$Height)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 29.50 70.00 72.00 71.90 74.00 81.25
skew(bodyfat_1$Height)
## [1] -4.527984
kurtosi(bodyfat_1$Height)
## [1] 46.6588
Check for correlation #Very weak negative correlation
cor(bodyfat_1$BodyFat, bodyfat_1$Height)
## [1] -0.03936278
Neck (create new categorical variable for Neck)
summary(bodyfat_1$Neck)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 31.10 37.90 39.80 39.74 41.50 54.70
bodyfat_1$neck_cat[bodyfat_1$Neck>=31.1 & bodyfat_1$Neck<=37.9]="Low"
bodyfat_1$neck_cat[bodyfat_1$Neck>=37.91 & bodyfat_1$Neck<=41.5]="Midium"
bodyfat_1$neck_cat[bodyfat_1$Neck>=41.51 & bodyfat_1$Neck<=54.7]="High"
table(bodyfat_1$neck_cat)
##
## High Low Midium
## 247 258 503
any(is.na(bodyfat_1$neck_cat))
## [1] FALSE
Visualize
bodyfat_1$neck_cat<-factor(bodyfat_1$neck_cat, ordered = TRUE, levels = c("Low", "Midium","High"))
ggplot(bodyfat_1, aes(x=fat_value_cat, fill=neck_cat))+geom_bar(position = position_dodge(preserve = "single"))
#Univariate Distribution for Neck variable #Normally distributed #Positively skewed #Leptokurtic
plot(density(bodyfat_1$Neck))
hist(bodyfat_1$Neck)
summary(bodyfat_1$Neck)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 31.10 37.90 39.80 39.74 41.50 54.70
skew(bodyfat_1$Neck)
## [1] 0.3853075
kurtosi(bodyfat_1$Neck)
## [1] 1.593368
Check for correlation #Positive correlation
cor(bodyfat_1$BodyFat, bodyfat_1$Neck)
## [1] 0.4905752
Chest (create new categorical variable for chest)
summary(bodyfat_1$Chest)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 79.30 96.47 101.75 102.57 107.33 139.70
bodyfat_1$chest_cat[bodyfat_1$Chest>=79.3 & bodyfat_1$Chest<=96.47]="Low"
bodyfat_1$chest_cat[bodyfat_1$Chest>=96.48 & bodyfat_1$Chest<=107.33]="Medium"
bodyfat_1$chest_cat[bodyfat_1$Chest>=107.34 & bodyfat_1$Chest<=140]="High"
any(is.na(bodyfat_1$chest_cat))
## [1] FALSE
Visualize
bodyfat_1$chest_cat<-factor(bodyfat_1$chest_cat, ordered = TRUE, levels = c("Low", "Medium","High"))
ggplot(bodyfat_1, aes(x=fat_value_cat, fill=chest_cat))+geom_bar(position = position_dodge(preserve = "single"))
#Univariate Distribution for Chest variable #Normally distributed #Positively skewed #Leptokurtic
plot(density(bodyfat_1$Chest))
hist(bodyfat_1$Chest)
summary(bodyfat_1$Chest)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 79.30 96.47 101.75 102.57 107.33 139.70
skew(bodyfat_1$Chest)
## [1] 0.6546895
kurtosi(bodyfat_1$Chest)
## [1] 0.8955072
Check for correlation #Positive correlation
cor(bodyfat_1$BodyFat, bodyfat_1$Chest)
## [1] 0.7071347
Abdomen (create new categorical variable for Abdomen)
summary(bodyfat_1$Abdomen)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 69.40 86.70 93.10 94.36 101.20 151.70
bodyfat_1$abdomen_cat[bodyfat_1$Abdomen>=69.40 & bodyfat_1$Abdomen<=86.70]="Low"
bodyfat_1$abdomen_cat[bodyfat_1$Abdomen>=86.71 & bodyfat_1$Abdomen<=101.2]="Midium"
bodyfat_1$abdomen_cat[bodyfat_1$Abdomen>=101.21 & bodyfat_1$Abdomen<=151.7]="High"
table(bodyfat_1$abdomen_cat)
##
## High Low Midium
## 249 253 506
any(is.na(bodyfat_1$abdomen_cat))
## [1] FALSE
Visualize
bodyfat_1$abdomen_cat<-factor(bodyfat_1$abdomen_cat, ordered = TRUE, levels = c("Low", "Midium","High"))
ggplot(bodyfat_1, aes(x=fat_value_cat, fill=abdomen_cat))+geom_bar(position = position_dodge(preserve = "single"))
#Univariate Distribution for Abdomen variable #Normally distributed #Positively skewed #Leptokurtic
plot(density(bodyfat_1$Abdomen))
hist(bodyfat_1$Abdomen)
summary(bodyfat_1$Abdomen)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 69.40 86.70 93.10 94.36 101.20 151.70
skew(bodyfat_1$Abdomen)
## [1] 0.8148106
kurtosi(bodyfat_1$Abdomen)
## [1] 2.109904
Check for correlation #Positive correlation
cor(bodyfat_1$BodyFat, bodyfat_1$Abdomen)
## [1] 0.815291
Hip (create new categorical variable for Hip)
summary(bodyfat_1$Hip)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 85.00 97.28 101.20 101.90 105.60 151.70
bodyfat_1$Hip_cat[bodyfat_1$Hip>=80 & bodyfat_1$Hip<=97.28]="Small"
bodyfat_1$Hip_cat[bodyfat_1$Hip>=97.29 & bodyfat_1$Hip<=105.60]="Midium"
bodyfat_1$Hip_cat[bodyfat_1$Hip>=105.61 & bodyfat_1$Hip<=152]="Large"
table(bodyfat_1$Hip_cat)
##
## Large Midium Small
## 251 505 252
Visualize
bodyfat_1$Hip_cat<-factor(bodyfat_1$Hip_cat, ordered = TRUE, levels = c("Small", "Midium","Large"))
ggplot(bodyfat_1, aes(x=fat_value_cat, fill=Hip_cat))+geom_bar(position = position_dodge(preserve = "single"))
#Univariate Distribution for Hip variable #Normally distributed #Positively skewed #Leptokurtic
library(tidyverse)
library(psych)
plot(density(bodyfat_1$Hip))
hist(bodyfat_1$Hip)
summary(bodyfat_1$Hip)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 85.00 97.28 101.20 101.90 105.60 151.70
skew(bodyfat_1$Hip)
## [1] 1.397922
kurtosi(bodyfat_1$Hip)
## [1] 6.707908
Check for correlation #Positively correlated
cor(bodyfat_1$BodyFat, bodyfat_1$Hip)
## [1] 0.6310884
Thigh (create new categorical variable for Thigh)
summary(bodyfat_1$Thigh)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 47.20 57.60 60.80 61.16 64.20 90.80
bodyfat_1$Thigh_cat[bodyfat_1$Thigh>=47.2 & bodyfat_1$Thigh<=57.60]="Low"
bodyfat_1$Thigh_cat[bodyfat_1$Thigh>=57.61 & bodyfat_1$Thigh<=64.20]="Midium"
bodyfat_1$Thigh_cat[bodyfat_1$Thigh>=64.21 & bodyfat_1$Thigh<=91]="High"
table(bodyfat_1$Thigh_cat)
##
## High Low Midium
## 250 254 504
any(is.na(bodyfat_1$Thigh_cat))
## [1] FALSE
Visualize
bodyfat_1$Thigh_cat<-factor(bodyfat_1$Thigh_cat, ordered = TRUE, levels = c("Low", "Midium","High"))
ggplot(bodyfat_1, aes(x=fat_value_cat, fill=Thigh_cat))+geom_bar(position = position_dodge(preserve = "single"))
#Univariate Distribution for Thigh variable #Normally distributed #Positively skewed #Leptokurtic
plot(density(bodyfat_1$Thigh))
hist(bodyfat_1$Thigh)
summary(bodyfat_1$Thigh)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 47.20 57.60 60.80 61.16 64.20 90.80
skew(bodyfat_1$Thigh)
## [1] 0.7501523
kurtosi(bodyfat_1$Thigh)
## [1] 2.304358
Check for correlation #Positive correlation
cor(bodyfat_1$BodyFat, bodyfat_1$Thigh)
## [1] 0.5710217
Knee (create new categorical variable for Knee)
summary(bodyfat_1$Knee)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 33.00 38.30 40.00 40.09 41.70 52.10
bodyfat_1$Knee_cat[bodyfat_1$Knee>=33 & bodyfat_1$Knee<=38.30]="Low"
bodyfat_1$Knee_cat[bodyfat_1$Knee>=38.31 & bodyfat_1$Knee<=41.70]="Midium"
bodyfat_1$Knee_cat[bodyfat_1$Knee>=41.71 & bodyfat_1$Knee<=53]="High"
table(bodyfat_1$Knee_cat)
##
## High Low Midium
## 245 261 502
any(is.na(bodyfat_1$Knee_cat))
## [1] FALSE
Visualize
bodyfat_1$Knee_cat<-factor(bodyfat_1$Knee_cat, ordered = TRUE, levels = c("Low", "Midium","High"))
ggplot(bodyfat_1, aes(x=fat_value_cat, fill=Knee_cat))+geom_bar(position = position_dodge(preserve = "single"))
#Univariate Distribution for Knee variable #Normally distributed #Positively skewed #Leptokurtic
plot(density(bodyfat_1$Knee))
hist(bodyfat_1$Knee)
summary(bodyfat_1$Knee)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 33.00 38.30 40.00 40.09 41.70 52.10
skew(bodyfat_1$Knee)
## [1] 0.3930333
kurtosi(bodyfat_1$Knee)
## [1] 0.6793175
Check for correlation #Positive correlation
cor(bodyfat_1$BodyFat, bodyfat_1$Knee)
## [1] 0.5151022
Ankle (create new categorical variable for Ankle)
summary(bodyfat_1$Ankle)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 19.1 23.3 24.5 24.6 25.6 36.9
bodyfat_1$Ankle_cat[bodyfat_1$Ankle>=19 & bodyfat_1$Ankle<=23.3]="Low"
bodyfat_1$Ankle_cat[bodyfat_1$Ankle>=23.4 & bodyfat_1$Ankle<=25.6]="Midium"
bodyfat_1$Ankle_cat[bodyfat_1$Ankle>=25.7 & bodyfat_1$Ankle<=37]="High"
table(bodyfat_1$Ankle_cat)
##
## High Low Midium
## 251 257 500
any(is.na(bodyfat_1$Ankle_cat))
## [1] FALSE
Visualize
bodyfat_1$Ankle_cat<-factor(bodyfat_1$Ankle_cat, ordered = TRUE, levels = c("Low", "Midium","High"))
ggplot(bodyfat_1, aes(x=fat_value_cat, fill=Ankle_cat))+geom_bar(position = position_dodge(preserve = "single"))
#Univariate Distribution for Ankle variable #Normally distributed #Positively skewed #Leptokurtic
plot(density(bodyfat_1$Ankle))
hist(bodyfat_1$Ankle)
summary(bodyfat_1$Ankle)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 19.1 23.3 24.5 24.6 25.6 36.9
skew(bodyfat_1$Ankle)
## [1] 1.361183
kurtosi(bodyfat_1$Ankle)
## [1] 5.923247
Check for correlation #Positive correlation
cor(bodyfat_1$BodyFat, bodyfat_1$Ankle)
## [1] 0.2944401
Biceps (create new categorical variable for Biceps)
summary(bodyfat_1$Biceps)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 24.80 31.70 33.90 34.02 36.20 48.50
bodyfat_1$Biceps_cat[bodyfat_1$Biceps>=24.80 & bodyfat_1$Biceps<=31.70]="Low"
bodyfat_1$Biceps_cat[bodyfat_1$Biceps>=31.71 & bodyfat_1$Biceps<=36.20]="Midium"
bodyfat_1$Biceps_cat[bodyfat_1$Biceps>=36.21 & bodyfat_1$Biceps<=49]="High"
table(bodyfat_1$Biceps_cat)
##
## High Low Midium
## 251 253 504
any(is.na(bodyfat_1$Biceps_cat))
## [1] FALSE
Visualize
bodyfat_1$Biceps_cat<-factor(bodyfat_1$Biceps_cat, ordered = TRUE, levels = c("Low", "Midium","High"))
ggplot(bodyfat_1, aes(x=fat_value_cat, fill=Biceps_cat))+geom_bar(position = position_dodge(preserve = "single"))
#Univariate Distribution for Biceps variable #Normally distributed #Positively skewed #Leptokurtic
str(bodyfat_1)
## 'data.frame': 1008 obs. of 28 variables:
## $ Density : num 1.07 1.09 1.04 1.08 1.03 ...
## $ BodyFat : num 12.3 6.1 25.3 10.4 28.7 20.9 19.2 12.4 4.1 11.7 ...
## $ Age : num 23 22 22 26 24 24 26 25 25 23 ...
## $ Weight : num 154 173 154 185 184 ...
## $ Height : num 67.8 72.2 66.2 72.2 71.2 ...
## $ Neck : num 36.2 38.5 34 37.4 34.4 39 36.4 37.8 38.1 42.1 ...
## $ Chest : num 93.1 93.6 95.8 101.8 97.3 ...
## $ Abdomen : num 85.2 83 87.9 86.4 100 94.4 90.7 88.5 82.5 88.6 ...
## $ Hip : num 94.5 98.7 99.2 101.2 101.9 ...
## $ Thigh : num 59 58.7 59.6 60.1 63.2 66 58.4 60 62.9 63.1 ...
## $ Knee : num 37.3 37.3 38.9 37.3 42.2 42 38.3 39.4 38.3 41.7 ...
## $ Ankle : num 21.9 23.4 24 22.8 24 25.6 22.9 23.2 23.8 25 ...
## $ Biceps : num 32 30.5 28.8 32.4 32.2 35.7 31.9 30.5 35.9 35.6 ...
## $ Forearm : num 27.4 28.9 25.2 29.4 27.7 30.6 27.8 29 31.1 30 ...
## $ Wrist : num 17.1 18.2 16.6 18.2 17.7 18.8 17.7 18.8 18.2 19.2 ...
## $ fat_value_cat: Ord.factor w/ 3 levels "Low body fat"<..: 1 1 2 1 3 2 2 1 1 1 ...
## $ Density_cat : Ord.factor w/ 3 levels "Low density"<..: 2 3 1 3 1 2 2 2 3 2 ...
## $ age_cat : Ord.factor w/ 3 levels "Low"<"Midium"<..: 1 2 1 1 1 2 1 1 2 3 ...
## $ weight_cat : Ord.factor w/ 3 levels "Low weight"<"Midium weight"<..: 1 2 1 2 2 3 2 2 2 2 ...
## $ height_cat : Ord.factor w/ 3 levels "short"<"Midium"<..: 1 2 1 2 2 3 1 2 2 2 ...
## $ neck_cat : chr "Low" "Midium" "Low" "Low" ...
## $ chest_cat : Ord.factor w/ 3 levels "High"<"Low"<"Medium": 2 2 2 3 3 3 3 3 3 3 ...
## $ abdomen_cat : Ord.factor w/ 3 levels "Low"<"Midium"<..: 1 1 2 1 2 2 2 2 1 2 ...
## $ Hip_cat : Ord.factor w/ 3 levels "Small"<"Midium"<..: 1 2 2 2 2 3 2 1 2 2 ...
## $ Thigh_cat : Ord.factor w/ 3 levels "Low"<"Midium"<..: 2 2 2 2 2 3 2 2 2 2 ...
## $ Knee_cat : Ord.factor w/ 3 levels "Low"<"Midium"<..: 1 1 2 1 3 3 1 2 1 2 ...
## $ Ankle_cat : Ord.factor w/ 3 levels "Low"<"Midium"<..: 2 1 1 2 2 2 2 1 2 2 ...
## $ Biceps_cat : chr "Midium" "Low" "Low" "Midium" ...
plot(density(bodyfat_1$Biceps))
summary(bodyfat_1$Biceps)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 24.80 31.70 33.90 34.02 36.20 48.50
skew(bodyfat_1$Biceps)
## [1] 0.2234067
kurtosi(bodyfat_1$Biceps)
## [1] 0.3088062
Check for correlation
# Moderate positive correlation
cor(bodyfat_1$BodyFat, bodyfat_1$Biceps)
## [1] 0.5003333
Forearm (create a new categorical variable for Forearm)
summary(bodyfat_1$Forearm)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 21.00 28.70 30.20 30.16 31.70 37.90
bodyfat_1$Forearm_cat[bodyfat_1$Forearm>=21 & bodyfat_1$Forearm<=28.70]="Low"
bodyfat_1$Forearm_cat[bodyfat_1$Forearm>=28.71 & bodyfat_1$Forearm<=31.70]="Midium"
bodyfat_1$Forearm_cat[bodyfat_1$Forearm>=31.71 & bodyfat_1$Forearm<=38]="High"
table(bodyfat_1$Forearm_cat)
##
## High Low Midium
## 242 255 511
any(is.na(bodyfat_1$Forearm_cat))
## [1] FALSE
Visualization
bodyfat_1$Forearm_cat<-factor(bodyfat_1$Forearm_cat, ordered = TRUE, levels = c("Low", "Midium","High"))
ggplot(bodyfat_1, aes(x=fat_value_cat, fill=Forearm_cat))+geom_bar(position = position_dodge(preserve = "single"))
# Univariate distribution for the Forearm variable
# Approximately normal: slight negative skew (-0.15), slightly leptokurtic (0.45)
str(bodyfat_1)
## 'data.frame': 1008 obs. of 29 variables:
## $ Density : num 1.07 1.09 1.04 1.08 1.03 ...
## $ BodyFat : num 12.3 6.1 25.3 10.4 28.7 20.9 19.2 12.4 4.1 11.7 ...
## $ Age : num 23 22 22 26 24 24 26 25 25 23 ...
## $ Weight : num 154 173 154 185 184 ...
## $ Height : num 67.8 72.2 66.2 72.2 71.2 ...
## $ Neck : num 36.2 38.5 34 37.4 34.4 39 36.4 37.8 38.1 42.1 ...
## $ Chest : num 93.1 93.6 95.8 101.8 97.3 ...
## $ Abdomen : num 85.2 83 87.9 86.4 100 94.4 90.7 88.5 82.5 88.6 ...
## $ Hip : num 94.5 98.7 99.2 101.2 101.9 ...
## $ Thigh : num 59 58.7 59.6 60.1 63.2 66 58.4 60 62.9 63.1 ...
## $ Knee : num 37.3 37.3 38.9 37.3 42.2 42 38.3 39.4 38.3 41.7 ...
## $ Ankle : num 21.9 23.4 24 22.8 24 25.6 22.9 23.2 23.8 25 ...
## $ Biceps : num 32 30.5 28.8 32.4 32.2 35.7 31.9 30.5 35.9 35.6 ...
## $ Forearm : num 27.4 28.9 25.2 29.4 27.7 30.6 27.8 29 31.1 30 ...
## $ Wrist : num 17.1 18.2 16.6 18.2 17.7 18.8 17.7 18.8 18.2 19.2 ...
## $ fat_value_cat: Ord.factor w/ 3 levels "Low body fat"<..: 1 1 2 1 3 2 2 1 1 1 ...
## $ Density_cat : Ord.factor w/ 3 levels "Low density"<..: 2 3 1 3 1 2 2 2 3 2 ...
## $ age_cat : Ord.factor w/ 3 levels "Low"<"Midium"<..: 1 2 1 1 1 2 1 1 2 3 ...
## $ weight_cat : Ord.factor w/ 3 levels "Low weight"<"Midium weight"<..: 1 2 1 2 2 3 2 2 2 2 ...
## $ height_cat : Ord.factor w/ 3 levels "short"<"Midium"<..: 1 2 1 2 2 3 1 2 2 2 ...
## $ neck_cat : chr "Low" "Midium" "Low" "Low" ...
## $ chest_cat : Ord.factor w/ 3 levels "High"<"Low"<"Medium": 2 2 2 3 3 3 3 3 3 3 ...
## $ abdomen_cat : Ord.factor w/ 3 levels "Low"<"Midium"<..: 1 1 2 1 2 2 2 2 1 2 ...
## $ Hip_cat : Ord.factor w/ 3 levels "Small"<"Midium"<..: 1 2 2 2 2 3 2 1 2 2 ...
## $ Thigh_cat : Ord.factor w/ 3 levels "Low"<"Midium"<..: 2 2 2 2 2 3 2 2 2 2 ...
## $ Knee_cat : Ord.factor w/ 3 levels "Low"<"Midium"<..: 1 1 2 1 3 3 1 2 1 2 ...
## $ Ankle_cat : Ord.factor w/ 3 levels "Low"<"Midium"<..: 2 1 1 2 2 2 2 1 2 2 ...
## $ Biceps_cat : chr "Midium" "Low" "Low" "Midium" ...
## $ Forearm_cat : Ord.factor w/ 3 levels "Low"<"Midium"<..: 1 2 1 2 1 2 1 2 2 2 ...
plot(density(bodyfat_1$Forearm))
summary(bodyfat_1$Forearm)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 21.00 28.70 30.20 30.16 31.70 37.90
skew(bodyfat_1$Forearm)
## [1] -0.1509276
kurtosi(bodyfat_1$Forearm)
## [1] 0.4527081
Check for correlation
# Weak-to-moderate positive correlation
cor(bodyfat_1$BodyFat, bodyfat_1$Forearm)
## [1] 0.3792227
Wrist (create a new categorical variable for Wrist)
summary(bodyfat_1$Wrist)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 15.80 18.57 19.50 19.48 20.30 23.90
bodyfat_1$Wrist_cat[bodyfat_1$Wrist>=15 & bodyfat_1$Wrist<=18.57]="Low"
bodyfat_1$Wrist_cat[bodyfat_1$Wrist>=18.58 & bodyfat_1$Wrist<=20.30]="Midium"
bodyfat_1$Wrist_cat[bodyfat_1$Wrist>=20.31 & bodyfat_1$Wrist<=24]="High"
table(bodyfat_1$Wrist_cat)
##
## High Low Midium
## 249 252 507
any(is.na(bodyfat_1$Wrist_cat))
## [1] FALSE
Visualization
bodyfat_1$Wrist_cat<-factor(bodyfat_1$Wrist_cat, ordered = TRUE, levels = c("Low", "Midium","High"))
ggplot(bodyfat_1, aes(x=fat_value_cat, fill=Wrist_cat))+geom_bar(position = position_dodge(preserve = "single"))
# Univariate distribution for the Wrist variable
# Approximately normal: slight positive skew (0.10), slightly platykurtic (-0.18)
hist(bodyfat_1$Wrist)
summary(bodyfat_1$Wrist)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 15.80 18.57 19.50 19.48 20.30 23.90
skew(bodyfat_1$Wrist)
## [1] 0.1037711
kurtosi(bodyfat_1$Wrist)
## [1] -0.1767461
Check for correlation
# Weak-to-moderate positive correlation
cor(bodyfat_1$BodyFat, bodyfat_1$Wrist)
## [1] 0.3427374
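The skewness and kurtosis values reported for each circumference variable can be read against their moment-based definitions. The sketch below is a base-R illustration with a hypothetical helper name (`skew_kurt`) and made-up vectors; it uses the plain population-moment formulas, so its results may differ slightly from `skew()` and `kurtosi()` in the psych package, which support several estimator types.

```r
# Moment-based skewness and excess kurtosis (hypothetical helper, base R only)
skew_kurt <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  m2 <- mean(d^2)  # second central moment (population variance)
  m3 <- mean(d^3)  # third central moment
  m4 <- mean(d^4)  # fourth central moment
  c(skewness = m3 / m2^1.5,
    excess_kurtosis = m4 / m2^2 - 3)  # 0 for a normal distribution
}

symmetric <- c(1, 2, 3, 4, 5)      # illustrative data, symmetric around 3
right_tail <- c(1, 1, 1, 2, 10)    # illustrative data with a long right tail

res_sym <- skew_kurt(symmetric)
res_right <- skew_kurt(right_tail)
res_sym
res_right
```

A skewness near 0 together with an excess kurtosis near 0 is what "approximately normal" means in the comments above; positive excess kurtosis corresponds to leptokurtic and negative to platykurtic.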
___
Returns: Check for bias
Choose a demographic variable that identifies the type of participants (Age).
# The CI value computed below is close to 0, so the young and old groups are roughly balanced
table(bodyfat_1$age_cat)
##
## Low Midium High
## 258 503 247
summary(bodyfat_1$Age)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 22.00 38.00 45.00 46.38 55.00 84.00
bodyfat_1$age_cat_2[bodyfat_1$Age>=22 & bodyfat_1$Age<=46.38]<-"young"
bodyfat_1$age_cat_2[bodyfat_1$Age>=46.39 & bodyfat_1$Age<=84]<-"old"
table(bodyfat_1$age_cat_2)
##
## old young
## 463 545
na<-463
nb<-545
CI<-(na-nb)/(na+nb)
CI
## [1] -0.08134921
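The balance check above types the two group counts in by hand. A minimal sketch that derives the same CI ratio directly from the table, assuming a split at the mean age as in this report, is shown below; the `age` vector is made-up illustrative data.

```r
# Hypothetical ages; a stand-in for bodyfat_1$Age
age <- c(23, 30, 41, 45, 52, 60, 71)

# Split at the mean age, mirroring the young/old grouping used above
age_cat_2 <- ifelse(age <= mean(age), "young", "old")
counts <- table(age_cat_2)

# CI = (n_old - n_young) / (n_old + n_young); values near 0 indicate balance
ci <- as.numeric(counts["old"] - counts["young"]) / sum(counts)
ci

# The same formula applied to the counts reported in this document
ci_reported <- (463 - 545) / (463 + 545)
ci_reported
```

Reading the counts from `table()` keeps the ratio correct if the data or the split point change.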
___
Returns: Correlation matrix
Convert all variables in the bodyfat dataset to numeric variables
bodyfat$Density <- as.numeric(bodyfat$Density)
bodyfat$BodyFat<- as.numeric(bodyfat$BodyFat)
bodyfat$Age<- as.numeric(bodyfat$Age)
bodyfat$Weight<-as.numeric(bodyfat$Weight)
bodyfat$Height<-as.numeric(bodyfat$Height)
bodyfat$Neck<-as.numeric(bodyfat$Neck)
bodyfat$Chest<-as.numeric(bodyfat$Chest)
bodyfat$Abdomen<-as.numeric(bodyfat$Abdomen)
bodyfat$Hip<-as.numeric(bodyfat$Hip)
bodyfat$Thigh<-as.numeric(bodyfat$Thigh)
bodyfat$Knee<-as.numeric(bodyfat$Knee)
bodyfat$Ankle<-as.numeric(bodyfat$Ankle)
bodyfat$Biceps<-as.numeric(bodyfat$Biceps)
bodyfat$Forearm<-as.numeric(bodyfat$Forearm)
bodyfat$Wrist<-as.numeric(bodyfat$Wrist)
str(bodyfat)
## 'data.frame': 1008 obs. of 15 variables:
## $ Density: num 1.07 1.09 1.04 1.08 1.03 ...
## $ BodyFat: num 12.3 6.1 25.3 10.4 28.7 20.9 19.2 12.4 4.1 11.7 ...
## $ Age : num 23 22 22 26 24 24 26 25 25 23 ...
## $ Weight : num 154 173 154 185 184 ...
## $ Height : num 67.8 72.2 66.2 72.2 71.2 ...
## $ Neck : num 36.2 38.5 34 37.4 34.4 39 36.4 37.8 38.1 42.1 ...
## $ Chest : num 93.1 93.6 95.8 101.8 97.3 ...
## $ Abdomen: num 85.2 83 87.9 86.4 100 94.4 90.7 88.5 82.5 88.6 ...
## $ Hip : num 94.5 98.7 99.2 101.2 101.9 ...
## $ Thigh : num 59 58.7 59.6 60.1 63.2 66 58.4 60 62.9 63.1 ...
## $ Knee : num 37.3 37.3 38.9 37.3 42.2 42 38.3 39.4 38.3 41.7 ...
## $ Ankle : num 21.9 23.4 24 22.8 24 25.6 22.9 23.2 23.8 25 ...
## $ Biceps : num 32 30.5 28.8 32.4 32.2 35.7 31.9 30.5 35.9 35.6 ...
## $ Forearm: num 27.4 28.9 25.2 29.4 27.7 30.6 27.8 29 31.1 30 ...
## $ Wrist : num 17.1 18.2 16.6 18.2 17.7 18.8 17.7 18.8 18.2 19.2 ...
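The fifteen column-by-column conversions can also be written as a single step. The sketch below assumes every column is meant to be numeric, as is the case for the bodyfat data; the small `df` data frame is a made-up stand-in. One possible reason the columns need converting at all is that `read.csv2()` defaults to `dec = ","`, so values written with a decimal point may be read as text.

```r
# Hypothetical stand-in for a data frame whose columns were read as text
df <- data.frame(Density = c("1.0708", "1.0853"),
                 Age = c("23", "22"),
                 Weight = c("154.25", "173.25"),
                 stringsAsFactors = FALSE)

# df[] <- keeps the data.frame structure while replacing every column
df[] <- lapply(df, as.numeric)

str(df)
```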
___
Returns: Plot correlation matrix
library("Hmisc")
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
##
## Attaching package: 'Hmisc'
## The following object is masked from 'package:psych':
##
## describe
## The following objects are masked from 'package:dplyr':
##
## src, summarize
## The following object is masked from 'package:plotly':
##
## subplot
## The following objects are masked from 'package:base':
##
## format.pval, units
library("corrplot")
## corrplot 0.92 loaded
bodyfat.cor<-cor(bodyfat)
corrplot(bodyfat.cor,method = "number", number.cex = 0.6)
___
Returns: Linear Regression
Building model_1 (includes all variables)
The p-value of the overall F-test is < 2.2e-16, which is highly significant: at least one of the predictor variables is significantly related to the outcome variable (BodyFat).
model_1<-lm(BodyFat~Density+Age+Weight+Height+Neck+Chest+Abdomen+Hip+Thigh+Knee+Ankle+Biceps+Forearm+Wrist, data = bodyfat)
summary(model_1)
##
## Call:
## lm(formula = BodyFat ~ Density + Age + Weight + Height + Neck +
## Chest + Abdomen + Hip + Thigh + Knee + Ankle + Biceps + Forearm +
## Wrist, data = bodyfat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.9541 -0.8261 -0.0661 0.7261 15.3108
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.812e+02 5.996e+00 63.578 < 2e-16 ***
## Density -3.968e+02 5.285e+00 -75.078 < 2e-16 ***
## Age -8.628e-03 6.161e-03 -1.400 0.161722
## Weight -1.458e-01 6.806e-03 -21.423 < 2e-16 ***
## Height 9.880e-02 1.770e-02 5.582 3.07e-08 ***
## Neck 1.630e-01 4.242e-02 3.844 0.000129 ***
## Chest 9.513e-02 1.875e-02 5.075 4.63e-07 ***
## Abdomen 1.057e-01 2.015e-02 5.246 1.90e-07 ***
## Hip 1.925e-01 2.604e-02 7.395 3.00e-13 ***
## Thigh 2.423e-02 2.727e-02 0.889 0.374464
## Knee 2.040e-01 4.506e-02 4.526 6.72e-06 ***
## Ankle 1.121e-01 4.122e-02 2.719 0.006654 **
## Biceps 4.691e-02 3.220e-02 1.457 0.145570
## Forearm 8.500e-02 3.853e-02 2.206 0.027596 *
## Wrist 8.928e-01 9.040e-02 9.876 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.654 on 993 degrees of freedom
## Multiple R-squared: 0.9621, Adjusted R-squared: 0.9616
## F-statistic: 1802 on 14 and 993 DF, p-value: < 2.2e-16
___
Returns: Create the prediction equation for the regression model (pred1)
bodyfat$bodyfat_pred_1=(-396.8*(bodyfat$Density)) + (-0.008628*(bodyfat$Age)) + (-0.1458*(bodyfat$Weight)) + (0.0988*(bodyfat$Height)) + (0.163*(bodyfat$Neck)) +(0.09513*(bodyfat$Chest)) + (0.1057*(bodyfat$Abdomen)) + (0.1925*(bodyfat$Hip)) +(0.02423*(bodyfat$Thigh)) + (0.204*(bodyfat$Knee)) + (0.1121*(bodyfat$Ankle)) +
(0.04691*(bodyfat$Biceps)) + (0.085*(bodyfat$Forearm)) + (0.8928*(bodyfat$Wrist)+381.2)
head(bodyfat)
| Density | BodyFat | Age | Weight | Height | Neck | Chest | Abdomen | Hip | Thigh | Knee | Ankle | Biceps | Forearm | Wrist | bodyfat_pred_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.0708 | 12.3 | 23 | 154.25 | 67.75 | 36.2 | 93.1 | 85.2 | 94.5 | 59.0 | 37.3 | 21.9 | 32.0 | 27.4 | 17.1 | 12.857019 |
| 1.0853 | 6.1 | 22 | 173.25 | 72.25 | 38.5 | 93.6 | 83.0 | 98.7 | 58.7 | 37.3 | 23.4 | 30.5 | 28.9 | 18.2 | 6.984968 |
| 1.0414 | 25.3 | 22 | 154.00 | 66.25 | 34.0 | 95.8 | 87.9 | 99.2 | 59.6 | 38.9 | 24.0 | 28.8 | 25.2 | 16.6 | 25.301044 |
| 1.0751 | 10.4 | 26 | 184.75 | 72.25 | 37.4 | 101.8 | 86.4 | 101.2 | 60.1 | 37.3 | 22.8 | 32.4 | 29.4 | 18.2 | 10.860803 |
| 1.0340 | 28.7 | 24 | 184.25 | 71.25 | 34.4 | 97.3 | 100.0 | 101.9 | 63.2 | 42.2 | 24.0 | 32.2 | 27.7 | 17.7 | 28.424775 |
| 1.0502 | 20.9 | 24 | 210.25 | 74.75 | 39.0 | 104.5 | 94.4 | 107.8 | 66.0 | 42.0 | 25.6 | 35.7 | 30.6 | 18.8 | 22.129350 |
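The prediction equation above copies rounded coefficients from the model summary by hand. A minimal sketch of the equivalent route through `predict()`, shown on simulated data rather than the bodyfat dataset (the names `d`, `fit`, `manual` and `pred` are illustrative), avoids both the rounding and the risk of transcription errors.

```r
set.seed(1)
# Simulated data with two predictors; not the project dataset
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$y <- 2 + 3 * d$x1 - 1.5 * d$x2 + rnorm(50, sd = 0.1)

fit <- lm(y ~ x1 + x2, data = d)

# Hand-written equation built from the fitted coefficients
b <- coef(fit)
manual <- b[["(Intercept)"]] + b[["x1"]] * d$x1 + b[["x2"]] * d$x2

# Same predictions, computed by R from the model object
pred <- predict(fit, newdata = d)

all.equal(unname(pred), manual)
```

With the real model, `predict(model_1, newdata = bodyfat)` would play the role of `pred`; any small difference from a hand-typed equation comes from the coefficients being rounded in the printed summary.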
___
Returns: Building model_2 (with no outliers)
#Check for outliers
cooksd<-cooks.distance(model_1)
#Plot the Cook's distance
sample_size<-nrow(bodyfat)
plot(cooksd)
#Add cutoff line
abline(h=8/sample_size, col="red")
#Label the points above the cutoff
text(x=1:length(cooksd)+1, y=cooksd, labels=ifelse(cooksd>8/sample_size, names(cooksd),""),col="red")
___
Returns: Remove outliers
#Take the row indices of the 9 observations with the largest Cook's distance
top_x_outliers<-9
influential<-as.numeric(names(sort(cooksd, decreasing = TRUE)[1:top_x_outliers]))
#Subset the dataset without outliers
dataframe_no_outliers<-bodyfat[-influential, ]
___
Returns: Build linear regression model
model_no_outlier<-lm(BodyFat~Density+Weight+Height+Neck+Chest+Abdomen+Hip+Knee+Wrist,data =dataframe_no_outliers)
summary(model_no_outlier)
##
## Call:
## lm(formula = BodyFat ~ Density + Weight + Height + Neck + Chest +
## Abdomen + Hip + Knee + Wrist, data = dataframe_no_outliers)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.8949 -0.7679 0.0254 0.7601 6.2699
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 404.55891 4.71200 85.857 < 2e-16 ***
## Density -427.16424 4.13369 -103.337 < 2e-16 ***
## Weight -0.16284 0.00541 -30.099 < 2e-16 ***
## Height 0.28599 0.02236 12.793 < 2e-16 ***
## Neck 0.29722 0.03150 9.434 < 2e-16 ***
## Chest 0.10374 0.01441 7.200 1.19e-12 ***
## Abdomen 0.06570 0.01459 4.504 7.46e-06 ***
## Hip 0.25343 0.01795 14.120 < 2e-16 ***
## Knee 0.15182 0.03362 4.516 7.08e-06 ***
## Wrist 0.88570 0.06112 14.492 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.289 on 989 degrees of freedom
## Multiple R-squared: 0.9767, Adjusted R-squared: 0.9765
## F-statistic: 4614 on 9 and 989 DF, p-value: < 2.2e-16
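The outlier step for model_2 removes a fixed number of observations (the top 9 by Cook's distance) while plotting a cutoff of 8/n. A sketch of the alternative, in which the cutoff itself decides which rows are dropped, is given below on simulated data with one injected outlier; the cutoff value mirrors the one plotted in this report and is a convention, not a fixed rule.

```r
set.seed(42)
# Simulated data with one deliberately corrupted, high-leverage observation
d <- data.frame(x = 1:30)
d$y <- 2 * d$x + rnorm(30)
d$y[30] <- 0

fit <- lm(y ~ x, data = d)
cooksd <- cooks.distance(fit)

# Flag every observation above the cutoff instead of a fixed top-N
cutoff <- 8 / nrow(d)
influential <- which(cooksd > cutoff)

# Guard against an empty index: d[-integer(0), ] would drop ALL rows
d_clean <- if (length(influential) > 0) d[-influential, ] else d

fit_clean <- lm(y ~ x, data = d_clean)
summary(fit_clean)$sigma
```

Whichever rule is used, the refitted model is estimated on a smaller dataset, so its residual standard error is not strictly comparable with that of the model fitted on all rows.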
___
Returns: Create the prediction equation for the regression model (pred2: without outliers)
bodyfat$bodyfat_pred_2=(-427.1642*(bodyfat$Density)) + (-0.16284*(bodyfat$Weight)) + (0.28599*(bodyfat$Height)) + (0.29722*(bodyfat$Neck)) +(0.10374*(bodyfat$Chest)) + (0.06570*(bodyfat$Abdomen)) + (0.25343*(bodyfat$Hip)) + (0.15182*(bodyfat$Knee)) + (0.88570*(bodyfat$Wrist)+404.55891)
head(bodyfat)
| Density | BodyFat | Age | Weight | Height | Neck | Chest | Abdomen | Hip | Thigh | Knee | Ankle | Biceps | Forearm | Wrist | bodyfat_pred_1 | bodyfat_pred_2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.0708 | 12.3 | 23 | 154.25 | 67.75 | 36.2 | 93.1 | 85.2 | 94.5 | 59.0 | 37.3 | 21.9 | 32.0 | 27.4 | 17.1 | 12.857019 | 12.181926 |
| 1.0853 | 6.1 | 22 | 173.25 | 72.25 | 38.5 | 93.6 | 83.0 | 98.7 | 58.7 | 37.3 | 23.4 | 30.5 | 28.9 | 18.2 | 6.984968 | 6.810652 |
| 1.0414 | 25.3 | 22 | 154.00 | 66.25 | 34.0 | 95.8 | 87.9 | 99.2 | 59.6 | 38.9 | 24.0 | 28.8 | 25.2 | 16.6 | 25.301044 | 25.147066 |
| 1.0751 | 10.4 | 26 | 184.75 | 72.25 | 37.4 | 101.8 | 86.4 | 101.2 | 60.1 | 37.3 | 22.8 | 32.4 | 29.4 | 18.2 | 10.860803 | 10.675748 |
| 1.0340 | 28.7 | 24 | 184.25 | 71.25 | 34.4 | 97.3 | 100.0 | 101.9 | 63.2 | 42.2 | 24.0 | 32.2 | 27.7 | 17.7 | 28.424775 | 28.041126 |
| 1.0502 | 20.9 | 24 | 210.25 | 74.75 | 39.0 | 104.5 | 94.4 | 107.8 | 66.0 | 42.0 | 25.6 | 35.7 | 30.6 | 18.8 | 22.129350 | 22.073554 |
___
Returns: Building model_3
Check for variable importance
#prepare training scheme
library(varImp)
## Loading required package: measures
##
## Attaching package: 'measures'
## The following object is masked from 'package:psych':
##
## AUC
## Loading required package: party
## Loading required package: grid
## Loading required package: mvtnorm
## Loading required package: modeltools
## Loading required package: stats4
## Loading required package: strucchange
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
##
## Attaching package: 'strucchange'
## The following object is masked from 'package:stringr':
##
## boundary
library(caret)
##
## Attaching package: 'caret'
## The following object is masked from 'package:varImp':
##
## varImp
## The following objects are masked from 'package:measures':
##
## MAE, RMSE
## The following object is masked from 'package:survival':
##
## cluster
## The following object is masked from 'package:purrr':
##
## lift
str(bodyfat)
## 'data.frame': 1008 obs. of 17 variables:
## $ Density : num 1.07 1.09 1.04 1.08 1.03 ...
## $ BodyFat : num 12.3 6.1 25.3 10.4 28.7 20.9 19.2 12.4 4.1 11.7 ...
## $ Age : num 23 22 22 26 24 24 26 25 25 23 ...
## $ Weight : num 154 173 154 185 184 ...
## $ Height : num 67.8 72.2 66.2 72.2 71.2 ...
## $ Neck : num 36.2 38.5 34 37.4 34.4 39 36.4 37.8 38.1 42.1 ...
## $ Chest : num 93.1 93.6 95.8 101.8 97.3 ...
## $ Abdomen : num 85.2 83 87.9 86.4 100 94.4 90.7 88.5 82.5 88.6 ...
## $ Hip : num 94.5 98.7 99.2 101.2 101.9 ...
## $ Thigh : num 59 58.7 59.6 60.1 63.2 66 58.4 60 62.9 63.1 ...
## $ Knee : num 37.3 37.3 38.9 37.3 42.2 42 38.3 39.4 38.3 41.7 ...
## $ Ankle : num 21.9 23.4 24 22.8 24 25.6 22.9 23.2 23.8 25 ...
## $ Biceps : num 32 30.5 28.8 32.4 32.2 35.7 31.9 30.5 35.9 35.6 ...
## $ Forearm : num 27.4 28.9 25.2 29.4 27.7 30.6 27.8 29 31.1 30 ...
## $ Wrist : num 17.1 18.2 16.6 18.2 17.7 18.8 17.7 18.8 18.2 19.2 ...
## $ bodyfat_pred_1: num 12.86 6.98 25.3 10.86 28.42 ...
## $ bodyfat_pred_2: num 12.18 6.81 25.15 10.68 28.04 ...
control<-trainControl(method="repeatedcv", number=10, repeats=3)
#train the model
lm_model<-train(BodyFat~.,data=dataframe_no_outliers, method="lm", preProcess="scale", trControl= control)
## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading
#estimate variable importance
importance<- varImp(lm_model, scale=FALSE)
#summarize importance
print(importance)
## lm variable importance
##
## Overall
## Density 105.926
## Weight 33.002
## Height 13.106
## Wrist 10.653
## Hip 9.685
## Abdomen 8.044
## Neck 7.509
## Chest 6.617
## Ankle 6.389
## Biceps 5.025
## Age 2.917
## Thigh 2.374
## Knee 1.995
## Forearm 1.175
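A likely cause of the "rank-deficient fit" warning during training is that `dataframe_no_outliers` was subset from `bodyfat` after `bodyfat_pred_1` had been added, so `BodyFat ~ .` includes a column that is an exact linear combination of the other predictors. The sketch below reproduces this situation on simulated data (all names are illustrative) and shows that excluding the derived column restores a full-rank fit.

```r
set.seed(7)
# Simulated predictors and outcome; not the project dataset
d <- data.frame(x1 = rnorm(40), x2 = rnorm(40))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(40, sd = 0.2)

# A derived column that is an exact linear combination of x1 and x2,
# playing the role of a stored prediction column
d$pred_1 <- 0.5 + 1.5 * d$x1 - 0.5 * d$x2

# Using every column as a predictor makes the design matrix rank-deficient:
# lm() cannot estimate a separate coefficient for pred_1 and reports NA
fit_all <- lm(y ~ ., data = d)
coef(fit_all)

# Dropping the derived column gives a full-rank model
fit_ok <- lm(y ~ . - pred_1, data = d)
coef(fit_ok)
```

In this report the corresponding change would be to exclude the stored prediction columns from the training formula or data before calling `train()`.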
plot(importance)
___
Returns: Building model_3 (only with the variables that show the highest importance for the outcome variable BodyFat)
model_3<-lm(BodyFat~Density+Weight+Height,data = bodyfat)
summary(model_3)
##
## Call:
## lm(formula = BodyFat ~ Density + Weight + Height, data = bodyfat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.9136 -1.7808 -0.0367 1.7259 16.8781
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.331e+02 6.143e+00 70.507 < 2e-16 ***
## Density -4.029e+02 5.855e+00 -68.819 < 2e-16 ***
## Weight 1.672e-02 3.993e-03 4.188 3.07e-05 ***
## Height 1.631e-01 2.504e-02 6.514 1.15e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.664 on 1004 degrees of freedom
## Multiple R-squared: 0.9006, Adjusted R-squared: 0.9003
## F-statistic: 3033 on 3 and 1004 DF, p-value: < 2.2e-16
___
Returns: Create the prediction equation for the regression model (pred3)
# Coefficients taken from summary(model_3) above: intercept 433.1, Density -402.9, Weight 0.01672, Height 0.1631
bodyfat$bodyfat_pred_3=(-402.9*(bodyfat$Density)) + (0.01672*(bodyfat$Weight)) + (0.1631*(bodyfat$Height)+433.1)
head(bodyfat)
| Density | BodyFat | Age | Weight | Height | Neck | Chest | Abdomen | Hip | Thigh | Knee | Ankle | Biceps | Forearm | Wrist | bodyfat_pred_1 | bodyfat_pred_2 | bodyfat_pred_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.0708 | 12.3 | 23 | 154.25 | 67.75 | 36.2 | 93.1 | 85.2 | 94.5 | 59.0 | 37.3 | 21.9 | 32.0 | 27.4 | 17.1 | 12.857019 | 12.181926 | 15.303765 |
| 1.0853 | 6.1 | 22 | 173.25 | 72.25 | 38.5 | 93.6 | 83.0 | 98.7 | 58.7 | 37.3 | 23.4 | 30.5 | 28.9 | 18.2 | 6.984968 | 6.810652 | 10.513345 |
| 1.0414 | 25.3 | 22 | 154.00 | 66.25 | 34.0 | 95.8 | 87.9 | 99.2 | 59.6 | 38.9 | 24.0 | 28.8 | 25.2 | 16.6 | 25.301044 | 25.147066 | 26.900195 |
| 1.0751 | 10.4 | 26 | 184.75 | 72.25 | 37.4 | 101.8 | 86.4 | 101.2 | 60.1 | 37.3 | 22.8 | 32.4 | 29.4 | 18.2 | 10.860803 | 10.675748 | 14.815205 |
| 1.0340 | 28.7 | 24 | 184.25 | 71.25 | 34.4 | 97.3 | 100.0 | 101.9 | 63.2 | 42.2 | 24.0 | 32.2 | 27.7 | 17.7 | 28.424775 | 28.041126 | 31.202935 |
| 1.0502 | 20.9 | 24 | 210.25 | 74.75 | 39.0 | 104.5 | 94.4 | 107.8 | 66.0 | 42.0 | 25.6 | 35.7 | 30.6 | 18.8 | 22.129350 | 22.073554 | 25.681525 |
___
Returns: Plot the models to check whether they meet the linear regression assumptions, compare them, and choose the best model
Plot model_1
#Residual standard error: 1.654 on 993 degrees of freedom #Multiple R-squared: 0.9621
par(mfrow = c(2, 2))
plot(model_1)
___
Returns: Plot model_2
#Residual standard error: 1.289 on 989 degrees of freedom #Multiple R-squared: 0.9767
par(mfrow = c(2, 2))
plot(model_no_outlier)
___
Returns: Plot model_3
#Residual standard error: 2.664 on 1004 degrees of freedom #Multiple R-squared: 0.9006, Adjusted R-squared: 0.9003
par(mfrow = c(2, 2))
plot(model_3)
In summary, we performed several analyses on our dataset: restructuring data types, creating new categorical variables, examining univariate distributions, checking the correlation of each variable with BodyFat, checking for bias, building three linear regression models (model 1, model 2, and model 3) with their prediction equations, checking for outliers, checking variable importance, and comparing the models on residual standard error and R-squared.
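The comparison of residual standard error and R-squared across the three models can be collected programmatically instead of being copied from each summary. The sketch below uses two simulated nested models as stand-ins; the names `models` and `comparison` are illustrative.

```r
set.seed(3)
# Simulated data; stand-ins for the project's fitted models
d <- data.frame(x1 = rnorm(60), x2 = rnorm(60))
d$y <- 1 + 2 * d$x1 + 0.5 * d$x2 + rnorm(60)

models <- list(
  full = lm(y ~ x1 + x2, data = d),
  reduced = lm(y ~ x1, data = d)
)

# Pull the fit statistics out of each model summary into one table
comparison <- data.frame(
  model = names(models),
  rse = sapply(models, function(m) summary(m)$sigma),
  r_squared = sapply(models, function(m) summary(m)$r.squared),
  adj_r_squared = sapply(models, function(m) summary(m)$adj.r.squared),
  row.names = NULL
)
comparison
```

Applied to this report, the list would hold `model_1`, `model_no_outlier` and `model_3`. Since `model_no_outlier` is fitted on a dataset with rows removed, its statistics should be compared with the other two with some caution.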